[Chapter 8] Exploring the Ames housing data

[DSLC stages]: EDA

In this document, you will find the PCS workflow and code for conducting a thorough EDA of the Ames housing data. Note that each section in this document corresponds to an interesting trend and finding. We did not include every exploratory avenue and every exploratory plot we made in this document.

Following each interesting figure that we explore in this document, we conduct a PCS evaluation to demonstrate the stability, predictability of the take-away message of the figure.

We examined and cleaned the Ames housing data in the file 01_cleaning.qmd. In each subsequent file that uses the cleaned version of the data, it is good practice to load in the original “raw” (uncleaned) data, and then clean it and pre-process it using the cleaning function you wrote. It is often helpful to keep a copy of the original uncleaned data in your environment too.

Note that our pre-processing steps were primarily so that the data would play nice with the predictive algorithms. In general the initial clean data is useful for EDA (sometimes the pre-processed data is too, but we will focus on the clean data for now) to ensure that your perspective is not skewed by the pre-processing steps (such as imputation), but it is also helpful to explore the pre-processed data too since this is the data you will be using in your analysis. You will see us examine both datasets in this document.

library(tidyverse)
library(janitor)
library(lubridate)
library(cluster)
library(fossil)
library(superheat)
# if library(superheat) doesn't work, you might first need to run:
# library(devtools)
# install_github("rlbarter/superheat")

source("functions/prepareAmesData.R")
# list all objects (and custom functions) that exist in our environment
ls()
 [1] "ames"                    "ames_orig"              
 [3] "ames_test"               "ames_test_clean"        
 [5] "ames_test_preprocessed"  "ames_train"             
 [7] "ames_train_clean"        "ames_train_preprocessed"
 [9] "ames_val"                "ames_val_clean"         
[11] "ames_val_preprocessed"   "cleanAmesData"          
[13] "preProcessAmesData"      "split_date"             
[15] "train_neighborhoods"    

1 High-level summary of the data

Here are some histograms of the numeric variables (which we already saw in the cleaning doc).

ames_train_clean %>%
  select_if(is.numeric) %>%
  select_if(~n_distinct(.) > 20) %>% 
  pivot_longer(everything(), names_to = "variable") %>%
  ggplot() +
  geom_histogram(aes(x = value), col = "white") +
  facet_wrap(~variable, scales = "free", ncol = 3)
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Warning: Removed 230 rows containing non-finite values (`stat_bin()`).

The following table shows a random sample of 15 houses (this won’t necessarily match the random sample that was in the book).

set.seed(28674)
ames_train_clean %>% sample_n(15) 
order pid ms_sub_class ms_zoning lot_frontage lot_area street alley lot_shape land_contour utilities lot_config land_slope neighborhood condition_1 condition_2 bldg_type house_style overall_qual overall_cond year_built year_remod_add roof_style roof_matl exterior_1st exterior_2nd mas_vnr_type mas_vnr_area exter_qual exter_cond foundation bsmt_qual bsmt_cond bsmt_exposure bsmt_fin_type_1 bsmt_fin_sf_1 bsmt_fin_type_2 bsmt_fin_sf_2 bsmt_unf_sf total_bsmt_sf heating heating_qc central_air electrical x1st_flr_sf x2nd_flr_sf low_qual_fin_sf gr_liv_area bsmt_full_bath bsmt_half_bath full_bath half_bath bedroom_abv_gr kitchen_abv_gr kitchen_qual tot_rms_abv_grd functional fireplaces fireplace_qu garage_type garage_yr_blt garage_finish garage_cars garage_area garage_qual garage_cond paved_drive wood_deck_sf open_porch_sf enclosed_porch x3ssn_porch screen_porch pool_area pool_qc fence misc_feature misc_val date mo_sold yr_sold sale_type sale_price
2638 902101050 50 RM 60 9600 Pave Grvl Reg Lvl AllPub Inside Gtl OldTown Norm Norm 1Fam 1.5Fin 5 6 1948 1948 Gable CompShg MetalSd MetalSd Stone 264 TA TA CBlock TA TA No Rec 276 Unf 0 936 1212 GasA Gd Y FuseA 1226 442 0 1668 1 0 1 0 3 1 TA 6 Typ 0 NA Detchd 1948 Unf 1 240 TA TA Y 0 0 140 0 0 0 NA NA NA 0 2006-05-01 5 2006 WD 135000
2469 528488120 20 FV 62 7500 Pave Pave Reg Lvl AllPub Inside Gtl Somerst Norm Norm 1Fam 1Story 7 5 2005 2005 Gable CompShg VinylSd VinylSd None 0 Gd TA PConc Gd TA No Unf 0 Unf 0 1257 1257 GasA Ex Y SBrkr 1266 0 0 1266 0 0 2 0 3 1 Gd 6 Typ 1 TA Attchd 2005 Unf 2 453 TA TA Y 38 144 0 0 0 0 NA NA NA 0 2006-04-01 4 2006 WD 176000
1964 535453080 20 RL NA 7500 Pave NA Reg Lvl AllPub Inside Gtl NAmes Norm Norm 1Fam 1Story 5 7 1959 2003 Hip CompShg BrkFace BrkFace None 0 TA TA CBlock TA TA No GLQ 574 Unf 0 466 1040 GasA Ex Y SBrkr 1040 0 0 1040 1 0 1 0 3 1 Gd 6 Typ 0 NA Attchd 1959 RFn 1 286 TA TA Y 0 0 0 0 0 0 NA NA Shed 0 2007-07-01 7 2007 WD 152000
2509 533212120 160 FV 24 2544 Pave Pave Reg Lvl AllPub Inside Gtl Somerst Norm Norm Twnhs 2Story 7 5 2005 2005 Gable CompShg MetalSd MetalSd None 0 Gd TA PConc Gd TA No Unf 0 Unf 0 600 600 GasA Ex Y SBrkr 520 623 80 1223 0 0 2 1 2 1 Gd 4 Typ 0 NA Detchd 2005 RFn 2 480 TA TA Y 0 166 0 0 0 0 NA NA NA 0 2006-07-01 7 2006 WD 149900
2066 905225020 60 RL 99 16779 Pave NA Reg Lvl AllPub Inside Gtl Sawyer Feedr Norm 1Fam 2Story 5 4 1920 1996 Gable CompShg Wd Sdng Wd Sdng BrkFace 356 TA Fa CBlock Gd TA No BLQ 267 Unf 0 404 671 GasA Fa Y SBrkr 1567 1087 0 2654 0 0 3 0 4 1 TA 11 Mod 1 Gd Attchd 1946 Unf 2 638 TA TA Y 128 570 0 0 0 0 NA NA Shed 500 2007-05-01 5 2007 WD 158000
2042 903475060 190 RM 60 10120 Pave NA IR1 Bnk AllPub Inside Gtl OldTown Feedr Norm 2fmCon 2.5Unf 7 4 1910 1910 Hip CompShg Wd Sdng Wd Sdng None 0 Fa TA CBlock TA TA No Unf 0 Unf 0 925 925 GasA TA N FuseF 964 925 0 1889 0 0 1 1 4 2 TA 9 Typ 1 Gd Detchd 1960 Unf 1 308 TA TA N 0 0 264 0 0 0 NA MnPrv NA 0 2007-01-01 1 2007 WD 122000
2170 908103090 20 RL 67 8308 Pave NA Reg Lvl AllPub Inside Gtl Edwards Norm Norm 1Fam 1Story 4 6 1963 1963 Gable CompShg VinylSd VinylSd Stone 20 TA Gd CBlock TA TA No BLQ 132 LwQ 841 115 1088 GasA TA Y SBrkr 1088 0 0 1088 0 0 1 0 2 1 TA 4 Typ 0 NA Detchd 2002 Unf 2 520 TA TA P 0 0 0 0 0 0 NA NA NA 0 2007-06-01 6 2007 COD 110000
2422 528228340 120 RL NA 3196 Pave NA Reg Lvl AllPub Inside Gtl Blmngtn Norm Norm TwnhsE 1Story 8 5 2003 2003 Gable CompShg VinylSd VinylSd BrkFace 40 Gd TA PConc Gd TA Gd Unf 0 Unf 0 1273 1273 GasA Ex Y SBrkr 1456 0 0 1456 0 0 2 0 2 1 Gd 7 Typ 1 TA Attchd 2003 Fin 2 400 TA TA Y 143 20 0 0 0 0 NA NA NA 0 2006-05-01 5 2006 WD 215000
2145 907250050 20 RL 80 10386 Pave NA Reg Lvl AllPub Inside Gtl CollgCr Norm Norm 1Fam 1Story 8 5 2004 2005 Gable CompShg CemntBd CmentBd Stone 246 Gd TA PConc Gd TA No GLQ 1464 Unf 0 536 2000 GasA Ex Y SBrkr 2000 0 0 2000 1 0 2 0 3 1 Gd 8 Typ 0 NA Attchd 2004 Fin 3 888 TA TA Y 168 0 0 0 0 0 NA NA NA 0 2007-07-01 7 2007 WD 305900
1186 534126090 20 RL 80 10400 Pave NA Reg Lvl AllPub Inside Gtl NWAmes Norm Norm 1Fam 1Story 7 6 1970 1970 Hip CompShg Plywood Plywood BrkFace 288 TA TA PConc TA TA No Unf 0 Unf 0 1304 1304 GasA Gd Y SBrkr 1682 0 0 1682 0 0 2 0 3 1 TA 7 Typ 1 Gd Attchd 1970 Unf 2 530 TA TA Y 98 0 0 0 0 0 NA MnPrv NA 0 2008-05-01 5 2008 WD 174000
2863 909280030 50 RL 86 11500 Pave NA IR1 Lvl AllPub Inside Gtl Crawfor Norm Norm 1Fam 1.5Fin 7 7 1936 1987 Gable CompShg BrkFace BrkFace None 0 Gd TA CBlock Gd TA No Rec 223 Unf 0 794 1017 GasA Gd Y SBrkr 1020 1037 0 2057 0 0 1 1 3 1 Gd 6 Typ 1 Gd Attchd 1936 Fin 1 180 Fa TA Y 0 0 0 0 322 0 NA NA NA 0 2006-06-01 6 2006 WD 250000
1310 902300260 75 RM 53 5350 Pave NA Reg Lvl AllPub Corner Gtl OldTown Artery Norm 1Fam 2Story 7 8 1920 1965 Gable CompShg Wd Sdng Wd Shng None 0 TA TA BrkTil TA TA No BLQ 116 Unf 0 508 624 GasA Ex Y SBrkr 730 720 0 1450 0 0 1 0 3 1 TA 7 Typ 0 NA Detchd 1935 Unf 1 288 TA TA Y 0 192 0 0 0 0 NA MnPrv NA 0 2008-03-01 3 2008 WD 132000
2444 528315070 60 RL 84 9660 Pave NA IR1 Lvl AllPub Inside Gtl NoRidge Norm Norm 1Fam 2Story 8 5 1998 1998 Gable CompShg VinylSd VinylSd BrkFace 242 Gd TA PConc Gd TA No GLQ 791 Unf 0 253 1044 GasA Ex Y SBrkr 1079 897 0 1976 1 0 2 1 3 1 Gd 7 Typ 1 Ex Attchd 1998 Fin 3 885 TA TA Y 210 31 0 0 0 0 NA NA NA 0 2006-02-01 2 2006 WD 286000
1484 907425030 120 RM NA 4438 Pave NA Reg Lvl AllPub Inside Gtl CollgCr Norm Norm TwnhsE 1Story 6 5 2004 2004 Gable CompShg VinylSd VinylSd BrkFace 205 Gd TA PConc Gd TA Av GLQ 662 Unf 0 186 848 GasA Ex Y SBrkr 848 0 0 848 1 0 1 0 1 1 Gd 3 Typ 1 Gd Attchd 2004 RFn 2 420 TA TA Y 149 0 0 0 0 0 NA NA NA 0 2008-01-01 1 2008 WD 149000
2141 907203060 20 RL 70 7700 Pave NA Reg Lvl AllPub Corner Gtl CollgCr Norm Norm 1Fam 1Story 5 5 1972 1972 Gable CompShg VinylSd VinylSd None 0 TA TA CBlock TA TA No LwQ 138 Rec 468 276 882 GasA TA Y SBrkr 882 0 0 882 1 0 1 0 3 1 TA 5 Typ 0 NA Detchd 1980 Unf 2 461 TA TA Y 96 0 0 0 0 0 NA MnPrv NA 0 2007-03-01 3 2007 WD 112500
set.seed(28674)
ames_train_preprocessed %>% sample_n(15) 
sale_price lot_frontage lot_area overall_qual overall_cond year_built year_remod_add mas_vnr_area exter_qual bsmt_qual bsmt_exposure total_bsmt_sf heating_qc gr_liv_area bedroom_abv_gr kitchen_qual tot_rms_abv_grd fireplaces fireplace_qu garage_yr_blt garage_finish garage_cars garage_area wood_deck_sf mo_sold yr_sold masonry_veneer_brick foundation_cinder foundation_concrete garage_attached lot_inside exterior_vinyl bathrooms porch residential_density irregular_lot_shape house_floors basement_finished_rating neighborhood_NAmes neighborhood_Gilbert neighborhood_NWAmes neighborhood_Somerst neighborhood_Sawyer neighborhood_BrkSide neighborhood_OldTown neighborhood_Edwards neighborhood_CollgCr neighborhood_Mitchel
135000 60 9600 5 6 1948 1948 264 3 3 1 1212 4 1668 3 3 6 0 0 1948 1 1 240 0 5 2006 0 1 0 0 1 0 2.0 1 2 0 1.5 4 0 0 0 0 0 0 1 0 0 0
176000 62 7500 7 5 2005 2005 0 4 4 1 1257 5 1266 3 4 6 1 3 2005 1 2 453 38 4 2006 0 0 1 1 1 1 2.0 1 0 0 1.0 1 0 0 0 1 0 0 0 0 0 0
152000 68 7500 5 7 1959 2003 0 3 3 1 1040 5 1040 3 4 6 0 0 1959 2 1 286 0 7 2007 0 1 0 1 1 0 2.0 0 1 0 1.0 6 1 0 0 0 0 0 0 0 0 0
149900 24 2544 7 5 2005 2005 0 4 4 1 600 5 1223 2 4 4 0 0 2005 2 2 480 0 7 2006 0 0 1 0 1 0 2.5 1 0 0 2.0 1 0 0 0 1 0 0 0 0 0 0
158000 99 16779 5 4 1920 1996 356 3 4 1 671 2 2654 4 3 11 1 4 1946 1 2 638 128 5 2007 1 1 0 1 1 0 3.0 1 1 0 2.0 3 0 0 0 0 1 0 0 0 0 0
122000 60 10120 7 4 1910 1910 0 2 3 1 925 3 1889 4 3 9 1 4 1960 1 1 308 0 1 2007 0 1 0 0 1 0 1.5 1 2 1 2.5 1 0 0 0 0 0 0 1 0 0 0
110000 67 8308 4 6 1963 1963 20 3 3 1 1088 3 1088 2 3 4 0 0 2002 1 2 520 0 6 2007 0 1 0 0 1 1 1.0 0 1 0 1.0 3 0 0 0 0 0 0 0 1 0 0
215000 68 3196 8 5 2003 2003 40 4 4 4 1273 5 1456 2 4 7 1 3 2003 3 2 400 143 5 2006 1 0 1 1 1 1 2.0 1 1 0 1.0 1 0 0 0 0 0 0 0 0 0 0
305900 80 10386 8 5 2004 2005 246 4 4 1 2000 5 2000 3 4 8 0 0 2004 3 3 888 168 7 2007 0 0 1 1 1 0 3.0 0 1 0 1.0 6 0 0 0 0 0 0 0 0 1 0
174000 80 10400 7 6 1970 1970 288 3 3 1 1304 4 1682 3 3 7 1 4 1970 1 2 530 98 5 2008 1 0 1 1 1 0 2.0 0 1 0 1.0 1 0 0 1 0 0 0 0 0 0 0
250000 86 11500 7 7 1936 1987 0 4 4 1 1017 4 2057 3 4 6 1 4 1936 3 1 180 0 6 2006 0 1 0 1 1 0 1.5 1 1 1 1.5 4 0 0 0 0 0 0 0 0 0 0
132000 53 5350 7 8 1920 1965 0 3 3 1 624 5 1450 3 3 7 0 0 1935 1 1 288 0 3 2008 0 0 0 0 0 0 1.0 1 2 0 2.0 3 0 0 0 0 0 0 1 0 0 0
286000 84 9660 8 5 1998 1998 242 4 4 1 1044 5 1976 3 4 7 1 5 1998 3 3 885 210 2 2006 1 0 1 1 1 1 3.5 1 1 1 2.0 6 0 0 0 0 0 0 0 0 0 0
149000 68 4438 6 5 2004 2004 205 4 4 3 848 5 848 1 4 3 1 4 2004 2 2 420 149 1 2008 1 0 1 1 1 1 2.0 0 2 0 1.0 6 0 0 0 0 0 0 0 0 1 0
112500 70 7700 5 5 1972 1972 0 3 3 1 882 3 882 3 3 5 0 0 1980 1 2 461 96 3 2007 0 1 0 0 0 1 2.0 0 1 0 1.0 2 0 0 0 0 0 0 0 0 1 0

1.1 Correlation matrix

The heatmap below shows the correlation relationship between the continuous numeric variables.

ames_train_preprocessed %>%
  select_if(is.numeric) %>%
  select_if(~n_distinct(.) > 20) %>% 
  cor %>%
  superheat(heat.pal = c("white", "#18678B", "black"),
            heat.pal.values = c(0, 0.7, 1), 
            pretty.order.rows = TRUE, 
            pretty.order.cols = TRUE, 
            grid.hline.col = "white",
            grid.vline.col = "white", 
            bottom.label.text.angle = 90, 
            bottom.label.size = 0.5)

What we can see is that there is a strong correlation between the gr_liv_area and sale_price (response) variable, as well as several other area-related variables (garage_area, total_bsmt_sf), and that the year-related variables are highly correlated (garage_yr_blt, year_built).

2 Exploring the response (sale price)

Since our goal for this project is to predict sale price, let’s take a closer look at the sale price variable.

The distribution looks fairly clean, although it is skewed by a couple of particularly expensive houses.

ames_train_clean %>%
  ggplot() +
  geom_histogram(aes(x = sale_price))
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

One option that we explore in pre-processing is log-transforming the response variable. The log-transformed sale price variable is indeed a lot more symmetric (which sometimes can improve prediction performance):

ames_train_clean %>%
  ggplot() +
  geom_histogram(aes(x = log(sale_price)))
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

3 Realtionship with the response (sale price)

Let’s examine the relationship of several variables with sale price.

First, let’s compare the continuous variables with sale price using scatterplots:

ames_train_clean %>%
  select_if(is.numeric) %>%
  select(-any_of(c("order", "pid", "misc_val",
                   "ms_sub_class"))) %>%
  select_if(~n_distinct(.) > 20) %>%
  pivot_longer(-c(sale_price), names_to = "variable", values_to = "value") %>%
  ggplot() +
  geom_point(aes(x = value, y = sale_price), alpha = 0.4) +
  facet_wrap(~variable, scales = "free", ncol = 4)
Warning: Removed 230 rows containing missing values (`geom_point()`).

The variables that leap out as being heavily related to living area are gr_liv_area, and the other area variables (x1st_flr_sf, tot_bsmt_sf). Several of these are removed in the “simplifying” pre-processing option though.

We can also physically compute the correlation of each numeric variable with sale price and plot it as bars to quantify these observations. This time, we will look at all the pre-processed variables:

cor_df <- cor(select_if(ames_train_preprocessed, is.numeric))[, "sale_price"] %>%
  enframe %>%
  dplyr::filter(!(name %in% c("sale_price_binary", "sale_price", "pid"))) %>%
  arrange(value) %>%
  mutate(name = fct_inorder(name)) 
cor_df %>% ggplot() +
  geom_col(aes(x = name, y = value), fill = "grey80", col = "white") +
  # negative names
  geom_text(aes(x = name, y = -0.005, label = name), 
            hjust = 1, vjust = 0.5, size = 3,
            data = dplyr::filter(cor_df, value < 0),
            col = "grey30") +
  # positive names
  geom_text(aes(x = name, y = 0.005, label = name), 
            hjust = 0, vjust = 0.5, size = 3,
            data = dplyr::filter(cor_df, value > 0),
            col = "grey30") +
  geom_hline(yintercept = c(-0.5, 0, 0.5), col = "grey70") +
  scale_y_continuous("Correlation with sale price",
                     limits = c(-0.5, 0.8)) +
  scale_x_discrete(NULL) +
  theme_classic() +
  theme(#axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5),
    axis.line.y = element_blank(),
    axis.text.y = element_blank(),
    axis.ticks.y = element_blank(),
    axis.line.x = element_line(color = "grey90"),
    axis.ticks.x = element_line(color = "grey90")) +
  coord_flip()

Next, we can look at whether there is a relationship with the discrete numeric variables using boxplots:

ames_train_clean %>%
  select_if(is.numeric) %>%
  select_if(~n_distinct(.) < 13) %>%
  select(-x3ssn_porch, -pool_area) %>%
  mutate(sale_price = ames_train_clean$sale_price) %>%
  pivot_longer(-c(sale_price), names_to = "variable", values_to = "value") %>%
  ggplot() +
  geom_boxplot(aes(x = value, group = value, y = sale_price), alpha = 0.5) +
  facet_wrap(~variable, scales = "free", ncol = 4)
Warning: Removed 2 rows containing missing values (`stat_boxplot()`).

The overall_qual variable stands out as having a strong relationship with the sale price, but it doesn’t look like a linear relationship. Perhaps it looks more linear with the log-transformed sale price variable?

3.1 Log-transformed sale price

Let’s reproduce these plots, but with the log-transformed sale price response variable.

ames_train_clean %>%
  select_if(is.numeric) %>%
  select(-order, -pid, -misc_val,
         -ms_sub_class) %>%
  select_if(~n_distinct(.) > 20) %>%
  mutate(log_sale_price = log(sale_price)) %>%
  select(-sale_price) %>%
  pivot_longer(-c(log_sale_price), names_to = "variable", values_to = "value") %>%
  ggplot() +
  geom_point(aes(x = value, y = log_sale_price), alpha = 0.4) +
  facet_wrap(~variable, scales = "free", ncol = 4)
Warning: Removed 230 rows containing missing values (`geom_point()`).

The linear relationships for the log-transformed sale price variable look even stronger now, and so too does the relationship between log(sale price) and overall_qual below:

ames_train_clean %>%
  select_if(is.numeric) %>%
  select_if(~n_distinct(.) < 13) %>%
  select(-x3ssn_porch, -pool_area) %>%
  mutate(log_sale_price = log(ames_train_clean$sale_price)) %>%
  pivot_longer(-c(log_sale_price), names_to = "variable", values_to = "value") %>%
  ggplot() +
  geom_boxplot(aes(x = value, group = value, y = log_sale_price), alpha = 0.5) +
  facet_wrap(~variable, scales = "free", ncol = 4)
Warning: Removed 2 rows containing missing values (`stat_boxplot()`).

4 Comparing neighborhoods

It might be interesting to see how the neighborhoods compare to one another. Below we print a map of Ames to provide some context.

Figure 1: A map showing where the neighborhoods of Ames are located.

This map was taken from the Tidy Modeling with R book by Max Kuhn and Julia Silge, who also provide a predictive analysis of this dataset. The data that we have does not contain the latitude and longitude information, but they seemed to have a version that did!

The center of the map which contains no houses corresponds to the university of Iowa.

The boxplots below compare the sale price distribution across the neighborhoods.

ames_train_clean %>%
  group_by(neighborhood) %>%
  mutate(mean_sale_price = mean(sale_price)) %>%
  ungroup() %>%
  arrange(desc(mean_sale_price)) %>%
  mutate(neighborhood = fct_inorder(neighborhood)) %>%
  ggplot() +
  geom_boxplot(aes(x = neighborhood, y = sale_price)) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))

Below we examine the size (gr_liv_area) for each neighborhood. It seems that not only are NoRidge and NridgHt the most expensive neighborhoods (above), they are also the neighborhoods with the largest houses:

ames_train_clean %>%
  group_by(neighborhood) %>%
  mutate(mean_gr_liv_area = mean(gr_liv_area)) %>%
  ungroup() %>%
  arrange(desc(mean_gr_liv_area)) %>%
  mutate(neighborhood = fct_inorder(neighborhood)) %>%
  ggplot() +
  geom_boxplot(aes(x = neighborhood, y = gr_liv_area)) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))